Pronunciation-Enhanced Chinese Word Embedding
Abstract
Chinese word embeddings have recently garnered considerable attention. Chinese characters and their sub-character components, which contain rich semantic information, are incorporated to learn Chinese word embeddings. Chinese characters can represent a combination of meaning, structure, and pronunciation. However, existing embedding learning methods focus on the structure and meaning of Chinese characters. In this study, we aim to develop an embedding learning method that can make complete use of the information represented by Chinese characters, including phonology, morphology, and semantics. Specifically, we propose a pronunciation-enhanced Chinese word embedding learning method, where the pronunciations of context characters and target characters are simultaneously encoded into the embeddings. Evaluations on word similarity, word analogy reasoning, text classification, and sentiment analysis validate the effectiveness of our proposed method.
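As a rough illustration of the idea in the abstract, and not the authors' model, the sketch below composes a word representation from three kinds of vectors: the word itself, its characters, and the pronunciations (pinyin) of those characters. Everything named here is an assumption made for the example: the toy `PINYIN` table, the `vec`, `compose`, and `cosine` helpers, the dimension `DIM`, and the choice of plain averaging. The vectors are random placeholders rather than trained embeddings.

```python
import random

# Toy pronunciation table (hypothetical). A real system would use a full
# pronunciation dictionary; tone digits follow the pinyin syllable.
PINYIN = {"中": "zhong1", "国": "guo2", "钟": "zhong1", "表": "biao3"}

DIM = 8                      # embedding size, chosen arbitrarily
_rng = random.Random(0)      # fixed seed so the placeholders are repeatable
_table = {}                  # cache: key -> vector


def vec(key):
    """Return the placeholder vector for a key, creating it on first use."""
    if key not in _table:
        _table[key] = [_rng.uniform(-0.5, 0.5) for _ in range(DIM)]
    return _table[key]


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def compose(word):
    """Average the word vector with its character and pinyin vectors."""
    parts = [vec(("word", word))]
    for ch in word:
        parts.append(vec(("char", ch)))
        if ch in PINYIN:
            # Characters with the same reading share one pinyin vector.
            parts.append(vec(("pinyin", PINYIN[ch])))
    return mean(parts)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(len(compose("中国")))
    print(cosine(compose("中"), compose("钟")))
```

Because 中 and 钟 map to the same `("pinyin", "zhong1")` key, their composed vectors share one component even though the characters differ. That shared component is how pronunciation information enters the representation in this simplified setup.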
Related resources
Chinese Textual Entailment Recognition Enhanced with Word Embedding
Textual entailment has been proposed as a unifying generic framework for modeling language variability and semantic inference in different Natural Language Processing (NLP) tasks. By evaluating on NTCIR-11 RITE3 Simplified Chinese subtask data set, this paper firstly demonstrates and compares the performance of Chinese textual entailment recognition models that combine different lexical, syntac...
Category Enhanced Word Embedding
Distributed word representations have been demonstrated to be effective in capturing semantic and syntactic regularities. Unsupervised representation learning from large unlabeled corpora can learn similar representations for those words that present similar cooccurrence statistics. Besides local occurrence statistics, global topical information is also important knowledge that may help discrim...
Multi-Granularity Chinese Word Embedding
This paper considers the problem of learning Chinese word embeddings. In contrast to English, a Chinese word is usually composed of characters, and most of the characters themselves can be further divided into components such as radicals. While characters and radicals contain rich information and are capable of indicating semantic meanings of words, they have not been fully exploited by existin...
Morpheme-Enhanced Spectral Word Embedding
Traditional word embedding models only learn word-level semantic information from corpus while neglect the valuable semantic information of words’ internal structures such as morphemes. To address this problem, the goal of this paper is to exploit the morphological information to enhance the quality of word embeddings. Based on spectral method, we propose two word embedding models: Morpheme on ...
Radical-Enhanced Chinese Character Embedding
We present a method to leverage radical for learning Chinese character embedding. Radical is a semantic and phonetic component of Chinese character. It plays an important role as characters with the same radical usually have similar semantic meaning and grammatical usage. However, existing Chinese processing algorithms typically regard word or character as the basic unit but ignore the crucial ...
Journal
Journal title: Cognitive Computation
Year: 2021
ISSN: 1866-9964, 1866-9956
DOI: https://doi.org/10.1007/s12559-021-09850-9